RDBMS
NoSQL
redis 127.0.0.1:6379> SET name "Henrik"
OK
redis 127.0.0.1:6379> GET name
"Henrik"
redis 127.0.0.1:6379> SET age "43"
OK
redis 127.0.0.1:6379> GET age
"43"
Use cases
Cache. Session cache.
In-memory, low latency computing. (Write heavy.)
Recommendation engines & Machine Learning.
Queue
One more thing...
Redis complex data types: lists, sets, hashes (maps) and streams.
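The "Queue" use case maps naturally onto the list type: producers push on one end, consumers pop from the other. The sketch below is a rough in-memory stand-in for that pattern, not a Redis client. The class name `ListQueue` is made up for illustration; its two methods only mimic the documented behaviour of the Redis commands LPUSH (returns the new list length) and RPOP (returns nil on an empty list).

```python
from collections import deque


class ListQueue:
    """In-memory stand-in for a Redis list used as a FIFO queue (hypothetical helper)."""

    def __init__(self):
        self._items = deque()

    def lpush(self, value):
        # Like LPUSH: add at the head and report the new length.
        self._items.appendleft(value)
        return len(self._items)

    def rpop(self):
        # Like RPOP: remove from the tail, or return None when the list is empty.
        return self._items.pop() if self._items else None


queue = ListQueue()
queue.lpush("job-1")
queue.lpush("job-2")

first = queue.rpop()   # oldest job comes out first
second = queue.rpop()
empty = queue.rpop()   # nothing left
print(first, second, empty)  # job-1 job-2 None
```

Pushing on the head and popping from the tail is what gives first-in, first-out order; pushing and popping on the same end would give a stack instead.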
CREATE TABLE people (id UUID PRIMARY KEY, firstname text, lastname text);
INSERT INTO people (id, lastname, firstname)
VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'Ingo','Henrik');
SELECT lastname, firstname FROM people WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
Use cases
Large (aka Web Scale) 100+ TB databases
Write optimized storage engine
Write availability (Dynamo HA)
One more thing...
Useful secondary indexes: See Cassandra 4.0 and DataStax Enterprise 6.8.3.
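"Write optimized storage engine" refers to the log-structured merge (LSM) design Cassandra uses: writes land in an in-memory table and are flushed to immutable sorted files later. The following is a minimal sketch of that idea only, under heavy simplifying assumptions: no commit log, no compaction, no tombstones. `TinyLSM` and `memtable_limit` are names invented for this example.

```python
class TinyLSM:
    """Toy LSM store: an in-memory memtable plus a list of immutable sorted tables."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []  # oldest first, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # A write only touches memory; nothing on "disk" is modified in place.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable as a key-sorted table and start a fresh one.
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Reads check the newest data first so later writes shadow older ones.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None


store = TinyLSM(memtable_limit=2)
store.put("a", 1)
store.put("b", 2)   # reaches the limit, triggers a flush
store.put("a", 3)   # newer value stays in the memtable
print(store.get("a"), store.get("b"), len(store.sstables))  # 3 2 1
```

The trade-off visible even in this toy version is that writes are cheap while reads may have to consult several tables, which is why real engines add compaction and per-table filters.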
> db.somecollection.insert({firstname: "Henrik", lastname: "Ingo", age: 42})
> db.somecollection.createIndex({lastname:1, firstname:1});
> db.somecollection.find({lastname: "Ingo"})
{_id: ObjectId("507f1f77bcf86cd799439011"), firstname: "Henrik",
lastname: "Ingo", age: 42}
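The compound index on (lastname, firstname) can serve a query on lastname alone, because lastname is the leading field of the sort order. Here is a rough sketch of why, using a sorted Python list and `bisect` as a stand-in for the real index structure. The two extra documents and the helper `find_by_lastname` are invented for illustration.

```python
import bisect

# Sample collection; the second and third documents are made up for this example.
docs = [
    {"_id": 1, "firstname": "Henrik", "lastname": "Ingo", "age": 42},
    {"_id": 2, "firstname": "Ada", "lastname": "Lovelace", "age": 36},
    {"_id": 3, "firstname": "Anna", "lastname": "Ingo", "age": 30},
]

# Index entries sorted by (lastname, firstname); the last field points back into docs.
index = sorted((d["lastname"], d["firstname"], i) for i, d in enumerate(docs))


def find_by_lastname(lastname):
    """Seek to the first entry with this lastname, then scan while it still matches."""
    start = bisect.bisect_left(index, (lastname,))
    results = []
    for last, _first, pos in index[start:]:
        if last != lastname:
            break
        results.append(docs[pos])
    return results


matches = find_by_lastname("Ingo")
print([d["firstname"] for d in matches])  # ['Anna', 'Henrik']
```

A query on firstname alone could not use the same seek, since entries with the same firstname are scattered across the sort order.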
Use cases
General purpose database. Competes with RDBMS.
Main selling points compared to relational:
JSON API, flexible schema, sharding.
Flexible schema strengths: Data hub.
What does the future look like...
Incremental innovation? Performance, GUI tools, integrations, SDKs...
gremlin> graph = TinkerFactory.createModern()
==> tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==> graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().has('name','marko').out('knows').values('name')
==> 'vadas'
==> 'josh'
Use cases
Analytical. Find friends of friends that own a cat
Social media, recommendation engines, etc.
National security
One more thing...
Gremlin, Cypher, GraphQL
OLTP graph databases exist. (DataStax)
Interesting unsolved problem: Optimal sharding for graph DBs.
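To make the traversal idea concrete, the Gremlin step `out('knows')` can be approximated in a few lines over an edge list. This is only a sketch: the edges mirror the TinkerPop "modern" toy graph used above, and the helper `out` is a made-up function, not part of any Gremlin API.

```python
# Edges of the TinkerPop "modern" toy graph as (source, label, target) triples.
edges = [
    ("marko", "knows", "vadas"),
    ("marko", "knows", "josh"),
    ("marko", "created", "lop"),
    ("josh", "created", "ripple"),
    ("josh", "created", "lop"),
    ("peter", "created", "lop"),
]


def out(vertices, label):
    """Follow outgoing edges with the given label from each vertex, in order."""
    return [dst for v in vertices for (src, lbl, dst) in edges if src == v and lbl == label]


friends = out(["marko"], "knows")       # one hop
projects = out(friends, "created")      # second hop, starting from the friends
print(friends, projects)  # ['vadas', 'josh'] ['ripple', 'lop']
```

Chaining hops like this is the shape of a "friends of friends" query. Each hop may land on a different machine once the graph is sharded, which is one way to see why optimal sharding is hard.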
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("Froscon demo").getOrCreate()
import spark.implicits._
val df = spark.read.json("people.json")
df.createOrReplaceTempView("people")
val sqlDF = spark.sql("SELECT * FROM people")
sqlDF.show()
+-----------+----------+-----+
| firstname | lastname | age |
+-----------+----------+-----+
| Henrik | Ingo | 43 |
+-----------+----------+-----+
Use cases
Data lake. S3.
Personalized user profile
Fraud detection, national security...
"Reporting"
One more thing...
Spark Streaming (micro-batch)
AWS Athena = Presto
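Batch-based streaming means the stream is cut into small time windows and each window is processed as an ordinary batch. A minimal sketch of that slicing, assuming events arrive already ordered by timestamp; the function `micro_batches` and the sample events are invented, and unlike a real streaming engine this version emits nothing for empty windows.

```python
from itertools import groupby


def micro_batches(events, interval):
    """Group time-ordered (timestamp, value) events into windows of `interval` seconds."""
    for window, group in groupby(events, key=lambda e: int(e[0] // interval)):
        # Yield the window start time and the values that fell inside it.
        yield window * interval, [value for _, value in group]


# Hypothetical events: (seconds since start, payload)
events = [(0.1, "a"), (0.9, "b"), (1.2, "c"), (2.5, "d")]
batches = list(micro_batches(events, interval=1))
print(batches)  # [(0, ['a', 'b']), (1, ['c']), (2, ['d'])]
```

The batch interval sets a floor on latency: an event waits until its window closes before anything downstream sees it.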
> curl -XPOST http://localhost:9200/froscon/people/id1 \
  -H 'Content-Type: application/json' -d '{"name":"Henrik Ingo"}'
> curl -XGET 'localhost:9200/froscon/_search?q=name:Ingo'
[{_index: "froscon", _type: "people", _id: "id1", _source:
{"name": "Henrik Ingo"}}]
Use cases
Google for your website
Queries beyond the typical RDBMS B-tree
Kibana analytics
Security monitoring
One more thing...
Elastic = MongoDB in size
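What lets a search engine answer `name:Ingo` without scanning every document is an inverted index, the core structure in Lucene: each token maps to the set of documents containing it. A rough sketch follows, with naive whitespace tokenization and lowercasing as stated assumptions. The document `id2` and the helper `search` are made up for illustration.

```python
from collections import defaultdict

# id1 matches the document indexed above; id2 is a made-up second document.
docs = {
    "id1": "Henrik Ingo",
    "id2": "Henrik Ibsen",
}

# Build the inverted index: token -> set of document ids containing that token.
index = defaultdict(set)
for doc_id, name in docs.items():
    for token in name.lower().split():
        index[token].add(doc_id)


def search(term):
    """Return the ids of documents containing the term, case-insensitively."""
    return sorted(index.get(term.lower(), set()))


print(search("Ingo"))    # ['id1']
print(search("henrik"))  # ['id1', 'id2']
```

A B-tree on the whole `name` field could only help with prefix matches such as names starting with "Henrik", whereas the inverted index finds a word at any position in the field.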
Until 2018
Apache/BSD | *GPL | open core | proprietary | |
Key-Value | Memcached | Redis | ||
Wide Col | Cassandra | BigTable, DynamoDB | ||
Document | MongoDB | MarkLogic | ||
Graph | Neo4j | DSE Graph | ||
Query Eng | Spark, Presto | Athena | ||
Search | Lucene, Solr | Elastic |
After 2018
Apache/BSD | *GPL | open core | proprietary | |
Key-Value | Memcached | Redis | ||
Wide Col | Cassandra | BigTable, DynamoDB | ||
Document | MongoDB | |||
Graph | Neo4j | |||
Query Eng | Spark, Presto | Athena | ||
Search | Lucene, Solr, Open Distro | Elastic |
Image credits: